Large Stepsize Gradient Descent for Non-Homogeneous Two-Layer Networks: Margin Improvement and Fast Optimization
Training neural networks with large stepsize gradient descent (GD) under the logistic loss typically involves two distinct phases: the empirical risk oscillates in the first phase but decreases monotonically in the second. We investigate this phenomenon in two-layer networks that satisfy a near-homogeneity condition. We show that the second phase begins once the empirical risk falls below a threshold that depends on the stepsize. Additionally, we show that the normalized margin grows nearly monotonically in the second phase, demonstrating an implicit bias of GD in training non-homogeneous predictors. If the dataset is linearly separable and the derivative of the activation function is bounded away from zero, we show that the average empirical risk decreases, implying that the first phase must end after finitely many steps. Finally, we demonstrate that by choosing a suitably large stepsize, GD that undergoes this phase transition is more efficient than GD that monotonically decreases the risk. Our analysis applies to networks of any width, beyond the well-known neural tangent kernel and mean-field regimes.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Beijing > Beijing (0.04)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.93)
- Leisure & Entertainment (0.46)
- Information Technology (0.45)
- Health & Medicine (0.45)
- North America > United States > California > San Diego County > San Diego (0.04)
- Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)
- North America > Canada > Quebec > Montreal (0.04)
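The two-phase picture in the abstract can be reproduced at toy scale. Below is a minimal NumPy sketch, not the paper's construction: a two-layer network trained by full-batch GD on the logistic loss with a deliberately large stepsize, tracking the empirical risk (oscillatory early, monotone late) and a normalized margin. The data, width, stepsize, and the "leaky softplus" activation are illustrative assumptions; that activation's derivative lies in (alpha, 1), i.e., bounded away from zero, as in the abstract's separable-data result.

```python
# Minimal sketch (illustrative assumptions throughout, not the paper's exact
# setup): full-batch GD with a large stepsize on the logistic loss for a
# two-layer net a^T phi(W x). Expect an oscillatory risk phase followed by a
# monotone phase, with the normalized margin growing in the second phase.
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 32, 10, 50                     # samples, input dim, hidden width
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))  # linearly separable toy labels

alpha = 0.2                              # "leaky softplus": phi' in (alpha, 1)
def phi(z):  return alpha * z + (1 - alpha) * np.logaddexp(0.0, z)
def dphi(z): return alpha + (1 - alpha) * 0.5 * (1.0 + np.tanh(z / 2.0))

W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)

eta = 4.0                                # deliberately large stepsize (tunable)
risks, margins = [], []
for t in range(400):
    Z = X @ W.T                          # (n, m) pre-activations
    f = phi(Z) @ a                       # network outputs f(x_i)
    risks.append(np.mean(np.logaddexp(0.0, -y * f)))       # logistic risk
    # The net is roughly 2-homogeneous in (W, a), hence the squared-norm scaling.
    margins.append(np.min(y * f) / (np.sum(W**2) + np.sum(a**2)))
    # dL/df_i = -y_i * sigmoid(-y_i f_i) / n, written via tanh for stability
    g = -y * 0.5 * (1.0 - np.tanh(y * f / 2.0)) / n
    grad_a = phi(Z).T @ g
    grad_W = (dphi(Z) * g[:, None] * a[None, :]).T @ X
    a -= eta * grad_a
    W -= eta * grad_W

print("risk, first 10 steps:", np.round(risks[:10], 3))   # typically oscillates
print("risk, last 10 steps: ", np.round(risks[-10:], 4))  # monotone decrease
print("normalized margin, step 200 vs 400:", margins[199], margins[-1])
```

Rerunning with a small stepsize (e.g. eta = 0.1) removes the oscillatory phase, and the risk decreases monotonically from the start, at the cost of more steps to reach the same risk, which is the efficiency comparison the abstract makes.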
Observation-Free Attacks on Online Learning to Rank
Sameep Chattopadhyay, Nikhil Karamchandani, Sharayu Moharir
Online learning to rank (OLTR) plays a critical role in information retrieval and machine learning systems, with a wide range of applications in search engines and content recommenders. However, despite their extensive adoption, the susceptibility of OLTR algorithms to coordinated adversarial attacks remains poorly understood. In this work, we present a novel framework for attacking several widely used OLTR algorithms. Our framework is designed to promote a set of target items so that they appear in the top-K recommendation list for T - o(T) rounds, while simultaneously inducing linear regret in the learning algorithm. We propose two novel attack strategies: CascadeOFA for CascadeUCB1 and PBMOFA for PBM-UCB. We provide theoretical guarantees showing that both strategies require only O(log T) manipulations to succeed. Additionally, we supplement our theoretical analysis with empirical results on real-world data.
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Switzerland (0.04)
- Education > Educational Setting > Online (0.61)
- Information Technology > Security & Privacy (0.56)
- Government > Military (0.56)
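To make the threat model concrete, here is a hedged simulation sketch, not the paper's CascadeOFA: a CascadeUCB1-style learner (rank the K items with the largest UCB indices; under the cascade click model the user scans the list and clicks the first attractive item) against a toy adversary that erases clicks on non-target items. The item count, attraction probabilities, target set, and budget accounting are all illustrative assumptions; the intuition it shows is that once the suppressed items' estimates collapse, only their O(log T)-many exploratory examinations still need manipulating.

```python
# Hedged sketch (not the paper's CascadeOFA): CascadeUCB1-style ranking under
# a cascade click model, with a click-erasing adversary promoting target items.
import numpy as np

rng = np.random.default_rng(1)
L, K, T = 10, 3, 20_000                 # items, list length, horizon
attract = np.linspace(0.9, 0.2, L)      # true attraction probabilities
targets = {7, 8, 9}                     # low-quality items the attacker promotes

means = np.zeros(L)                     # empirical click rate per item
pulls = np.zeros(L)                     # examination count per item
budget = 0                              # number of manipulated observations

for t in range(1, T + 1):
    ucb = means + np.sqrt(1.5 * np.log(t) / np.maximum(pulls, 1))
    ucb[pulls == 0] = np.inf            # examine unseen items first
    ranking = np.argsort(-ucb)[:K]      # list the K largest UCB indices

    clicks = rng.random(K) < attract[ranking]
    first = int(np.argmax(clicks)) if clicks.any() else K - 1
    for pos in range(first + 1):        # cascade: only scanned items observed
        item = ranking[pos]
        clicked = bool(clicks[pos]) and pos == first
        if clicked and item not in targets:
            clicked = False             # attacker erases the click
            budget += 1
        pulls[item] += 1
        means[item] += (clicked - means[item]) / pulls[item]

print("manipulations used:", budget, "out of", T, "rounds")
print("top-K by empirical mean:", np.argsort(-means)[:K])  # targets dominate
```

Comparing the manipulation count against log T at a few horizons makes the logarithmic budget visible; meanwhile the learner accumulates linear regret, since the displayed lists systematically exclude the genuinely attractive items.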
NeoDictaBERT: Pushing the Frontier of BERT models for Hebrew
Shaltiel Shmidman, Avi Shmidman, Moshe Koppel
Since their initial release, BERT models have demonstrated exceptional performance on a variety of tasks, despite their relatively small size (BERT-base has ~100M parameters). Nevertheless, the architectural choices used in these models are outdated compared to newer transformer-based models such as Llama3 and Qwen3. In recent months, several architectures have been proposed to close this gap. ModernBERT and NeoBERT both show strong improvements on English benchmarks and significantly extend the supported context window. Following their success, we introduce NeoDictaBERT and NeoDictaBERT-bilingual: BERT-style models trained using the same architecture as NeoBERT, with a dedicated focus on Hebrew texts. These models outperform existing Hebrew models on almost all Hebrew benchmarks and provide a strong foundation for downstream tasks. Notably, the NeoDictaBERT-bilingual model shows strong results on retrieval tasks, outperforming other multilingual models of similar size. In this paper, we describe the training process and report results across various benchmarks. We release the models to the community as part of our goal to advance research and development in Hebrew NLP.
- Europe > Austria > Vienna (0.14)
- Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- (3 more...)
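For readers who want to try the released checkpoints, the snippet below is a minimal usage sketch assuming the models are published on the Hugging Face Hub; the repository id dicta-il/NeoDictaBERT is a guess not stated in the abstract, and trust_remote_code is a cautious assumption since NeoBERT-style models often ship custom modeling code.

```python
# Minimal usage sketch; the hub id below is an assumption, not confirmed above.
import torch
from transformers import AutoModel, AutoTokenizer

MODEL_ID = "dicta-il/NeoDictaBERT"      # hypothetical repository id

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
# trust_remote_code: NeoBERT-style checkpoints may bundle custom modeling code.
model = AutoModel.from_pretrained(MODEL_ID, trust_remote_code=True)

text = "שלום עולם"                       # Hebrew: "Hello, world"
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state   # (1, seq_len, hidden_dim)
embedding = hidden[:, 0]                 # [CLS] vector as a sentence embedding
print(embedding.shape)
```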